A socket is one endpoint of a communication channel used by programs to pass data back and forth locally or across the Internet. Sockets have two primary properties controlling the way they send data: the address family controls the OSI network layer protocol used and the socket type controls the transport layer protocol.

Python supports three address families. The most common, AF_INET, is used for IPv4 Internet addressing. IPv4 addresses are four bytes long and are usually represented as a sequence of four numbers, one per octet, separated by dots (e.g., 10.1.1.5 and 127.0.0.1). These values are more commonly referred to as “IP addresses.” Almost all Internet networking is done using IP version 4 at this time.

AF_INET6 is used for IPv6 Internet addressing. IPv6 is the “next generation” version of the Internet protocol, and supports 128-bit addresses, traffic shaping, and routing features not available under IPv4. Adoption of IPv6 continues to grow, especially with the proliferation of cloud computing and the extra devices being added to the network because of Internet-of-things projects.

AF_UNIX is the address family for Unix Domain Sockets (UDS), an inter-process communication protocol available on POSIX-compliant systems. The implementation of UDS typically allows the operating system to pass data directly from process to process, without going through the network stack. This is more efficient than using AF_INET, but because the file system is used as the namespace for addressing, UDS is restricted to processes on the same system. The appeal of using UDS over other IPC mechanisms such as named pipes or shared memory is that the programming interface is the same as for IP networking, so the application can take advantage of efficient communication when running on a single host, but use the same code when sending data across the network.

Note:

THE AF_UNIX constant is only defined on systems where UDS is supported.

The socket type is usually either SOCK_DGRAM for message-oriented datagram transport or SOCK_STREAM for stream-oriented transport. Datagram sockets are most often associated with UDP, the user datagram protocol. They provide unreliable delivery of individual messages. Stream-oriented sockets are associated with TCP, transmission control protocol. They provide byte streams between the client and server, ensuring message delivery or failure notification through timeout management, retransmission, and other features.

Most application protocols that deliver a large amount of data, such as HTTP, are built on top of TCP because it makes it simpler to create complex applications when message ordering and delivery is handled automatically. UDP is commonly used for protocols where order is less important (since the messages are self-contained and often small, such as name look-ups via DNS), or for multicasting (sending the same data to several hosts). Both UDP and TCP can be used with either IPv4 or IPv6 addressing.

Looking up Hosts on the Network

socket includes functions to interface with the domain name services on the network so a program can convert the host name of a server into its numerical network address. Applications do not need to convert addresses explicitly before using them to connect to a server, but it can be useful when reporting errors to include the numerical address as well as the name value being used.

To find the official name of the current host, use gethostname()


In [1]:
import socket
 
print(socket.gethostname())


scott-t460

Use gethostbyname() to consult the operating system hostname resolution API and convert the name of a server to its numerical address.


In [2]:
import socket

HOSTS = [
    'apu',
    'pymotw.com',
    'www.python.org',
    'nosuchname',
]

for host in HOSTS:
    try:
        print('{} : {}'.format(host, socket.gethostbyname(host)))
    except socket.error as msg:
        print('{} : {}'.format(host, msg))


apu : [Errno -2] Name or service not known
pymotw.com : 66.33.211.242
www.python.org : 151.101.40.223
nosuchname : [Errno -2] Name or service not known

For access to more naming information about a server, use gethostbyname_ex(). It returns the canonical hostname of the server, any aliases, and all of the available IP addresses that can be used to reach it.


In [3]:
import socket

HOSTS = [
    'apu',
    'pymotw.com',
    'www.python.org',
    'nosuchname',
]

for host in HOSTS:
    print(host)
    try:
        name, aliases, addresses = socket.gethostbyname_ex(host)
        print('  Hostname:', name)
        print('  Aliases :', aliases)
        print(' Addresses:', addresses)
    except socket.error as msg:
        print('ERROR:', msg)
    print()


apu
ERROR: [Errno -2] Name or service not known

pymotw.com
  Hostname: pymotw.com
  Aliases : []
 Addresses: ['66.33.211.242']

www.python.org
  Hostname: python.map.fastly.net
  Aliases : ['www.python.org']
 Addresses: ['151.101.40.223']

nosuchname
ERROR: [Errno -2] Name or service not known

Use getfqdn() to convert a partial name to a fully qualified domain name.


In [5]:
import socket

for host in ['scott-t460', 'pymotw.com']:
    print('{:>10} : {}'.format(host, socket.getfqdn(host)))


scott-t460 : scott-t460
pymotw.com : apache2-zoo.george-washington.dreamhost.com

When the address of a server is available, use gethostbyaddr() to do a “reverse” lookup for the name.


In [7]:
import socket

hostname, aliases, addresses = socket.gethostbyaddr('10.104.190.53')

print('Hostname :', hostname)
print('Aliases  :', aliases)
print('Addresses:', addresses)


Hostname : scott-t460
Aliases  : []
Addresses: ['10.104.190.53', '172.17.0.1']

Finding Service Information

In addition to an IP address, each socket address includes an integer port number. Many applications can run on the same host, listening on a single IP address, but only one socket at a time can use a port at that address. The combination of IP address, protocol, and port number uniquely identify a communication channel and ensure that messages sent through a socket arrive at the correct destination.

Some of the port numbers are pre-allocated for a specific protocol. For example, communication between email servers using SMTP occurs over port number 25 using TCP, and web clients and servers use port 80 for HTTP. The port numbers for network services with standardized names can be looked up with getservbyname().


In [8]:
import socket
from urllib.parse import urlparse

URLS = [
    'http://www.python.org',
    'https://www.mybank.com',
    'ftp://prep.ai.mit.edu',
    'gopher://gopher.micro.umn.edu',
    'smtp://mail.example.com',
    'imap://mail.example.com',
    'imaps://mail.example.com',
    'pop3://pop.example.com',
    'pop3s://pop.example.com',
]

for url in URLS:
    parsed_url = urlparse(url)
    port = socket.getservbyname(parsed_url.scheme)
    print('{:>6} : {}'.format(parsed_url.scheme, port))


  http : 80
 https : 443
   ftp : 21
gopher : 70
  smtp : 25
  imap : 143
 imaps : 993
  pop3 : 110
 pop3s : 995

To reverse the service port lookup, use getservbyport().


In [9]:
import socket
from urllib.parse import urlunparse

for port in [80, 443, 21, 70, 25, 143, 993, 110, 995]:
    url = '{}://example.com/'.format(socket.getservbyport(port))
    print(url)


http://example.com/
https://example.com/
ftp://example.com/
gopher://example.com/
smtp://example.com/
imap2://example.com/
imaps://example.com/
pop3://example.com/
pop3s://example.com/

The number assigned to a transport protocol can be retrieved with getprotobyname().


In [10]:
import socket


def get_constants(prefix):
    """Create a dictionary mapping socket module
    constants to their names.
    """
    return {
        getattr(socket, n): n
        for n in dir(socket)
        if n.startswith(prefix)
    }


protocols = get_constants('IPPROTO_')

for name in ['icmp', 'udp', 'tcp']:
    proto_num = socket.getprotobyname(name)
    const_name = protocols[proto_num]
    print('{:>4} -> {:2d} (socket.{:<12} = {:2d})'.format(
        name, proto_num, const_name,
        getattr(socket, const_name)))


icmp ->  1 (socket.IPPROTO_ICMP =  1)
 udp -> 17 (socket.IPPROTO_UDP  = 17)
 tcp ->  6 (socket.IPPROTO_TCP  =  6)

Looking Up Server Addresses

getaddrinfo() converts the basic address of a service into a list of tuples with all of the information necessary to make a connection. The contents of each tuple will vary, containing different network families or protocols.


In [13]:
import socket


def get_constants(prefix):
    """Create a dictionary mapping socket module
    constants to their names.
    """
    return {
        getattr(socket, n): n
        for n in dir(socket)
        if n.startswith(prefix)
    }


families = get_constants('AF_')
types = get_constants('SOCK_')
protocols = get_constants('IPPROTO_')

for response in socket.getaddrinfo('www.python.org', 'http'):

    # Unpack the response tuple
    family, socktype, proto, canonname, sockaddr = response

    print('Family        :', families[family])
    print('Type          :', types[socktype])
    print('Protocol      :', protocols[proto])
    print('Canonical name:', canonname)
    print('Socket address:', sockaddr)
    print()


Family        : AF_INET
Type          : SOCK_STREAM
Protocol      : IPPROTO_TCP
Canonical name: 
Socket address: ('151.101.52.223', 80)

Family        : AF_INET
Type          : SOCK_DGRAM
Protocol      : IPPROTO_UDP
Canonical name: 
Socket address: ('151.101.52.223', 80)

Family        : AF_INET6
Type          : SOCK_STREAM
Protocol      : IPPROTO_TCP
Canonical name: 
Socket address: ('2a04:4e42:d::223', 80, 0, 0)

Family        : AF_INET6
Type          : SOCK_DGRAM
Protocol      : IPPROTO_UDP
Canonical name: 
Socket address: ('2a04:4e42:d::223', 80, 0, 0)

IP Address Representations

Network programs written in C use the data type struct sockaddr to represent IP addresses as binary values (instead of the string addresses usually found in Python programs). To convert IPv4 addresses between the Python representation and the C representation, use inet_aton() and inet_ntoa().


In [14]:
import binascii
import socket
import struct
import sys

for string_address in ['192.168.1.1', '127.0.0.1']:
    packed = socket.inet_aton(string_address)
    print('Original:', string_address)
    print('Packed  :', binascii.hexlify(packed))
    print('Unpacked:', socket.inet_ntoa(packed))
    print()


Original: 192.168.1.1
Packed  : b'c0a80101'
Unpacked: 192.168.1.1

Original: 127.0.0.1
Packed  : b'7f000001'
Unpacked: 127.0.0.1

The four bytes in the packed format can be passed to C libraries, transmitted safely over the network, or saved to a database compactly.

The related functions inet_pton() and inet_ntop() work with both IPv4 and IPv6 addresses, producing the appropriate format based on the address family parameter passed in.


In [15]:
import binascii
import socket
import struct
import sys

string_address = '2002:ac10:10a:1234:21e:52ff:fe74:40e'
packed = socket.inet_pton(socket.AF_INET6, string_address)

print('Original:', string_address)
print('Packed  :', binascii.hexlify(packed))
print('Unpacked:', socket.inet_ntop(socket.AF_INET6, packed))


Original: 2002:ac10:10a:1234:21e:52ff:fe74:40e
Packed  : b'2002ac10010a1234021e52fffe74040e'
Unpacked: 2002:ac10:10a:1234:21e:52ff:fe74:40e

In [ ]: